Multivariate Covariance Generalized Linear Models
We propose a general framework for non-normal multivariate data analysis
called multivariate covariance generalized linear models (McGLMs), designed to
handle multivariate response variables, along with a wide range of temporal and
spatial correlation structures defined in terms of a covariance link function
combined with a matrix linear predictor involving known matrices. The method is
motivated by three data examples that are not easily handled by existing
methods. The first example concerns multivariate count data, the second
involves response variables of mixed types, combined with repeated measures and
longitudinal structures, and the third involves a spatio-temporal analysis of
rainfall data. The models take non-normality into account in the conventional
way by means of a variance function, and the mean structure is modelled by
means of a link function and a linear predictor. The models are fitted using an
efficient Newton scoring algorithm based on quasi-likelihood and Pearson
estimating functions, using only second-moment assumptions. This provides a
unified approach to a wide variety of different types of response variables and
covariance structures, including multivariate extensions of repeated measures,
time series, longitudinal, spatial and spatio-temporal structures.
Comment: 21 pages, 5 figures
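To make the covariance structure described above more concrete, the following is a minimal Python sketch of how such a covariance matrix could be assembled for a single response. It assumes an identity covariance link, a power variance function V(mu) = mu^p, and the form Sigma = V(mu)^(1/2) (tau0*Z0 + tau1*Z1) V(mu)^(1/2). These choices, the function names, and all numeric values are illustrative assumptions, not details stated in the abstract.

```python
import math

def power_variance(mu, p):
    """Power variance function V(mu) = mu**p (p = 1 gives a Poisson-like variance)."""
    return [m ** p for m in mu]

def matrix_linear_predictor(taus, known_matrices):
    """Omega = sum_k tau_k * Z_k, where the Z_k are known n x n matrices."""
    n = len(known_matrices[0])
    return [[sum(t * Z[i][j] for t, Z in zip(taus, known_matrices))
             for j in range(n)] for i in range(n)]

def covariance(mu, p, taus, known_matrices):
    """Sigma_ij = sqrt(V(mu_i)) * Omega_ij * sqrt(V(mu_j)) (identity covariance link assumed)."""
    sd = [math.sqrt(v) for v in power_variance(mu, p)]
    omega = matrix_linear_predictor(taus, known_matrices)
    n = len(mu)
    return [[sd[i] * omega[i][j] * sd[j] for j in range(n)] for i in range(n)]

# Hypothetical example: three repeated measures on one unit.
Z0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity matrix: independent part
Z1 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # matrix of ones: shared (exchangeable) part
mu = [4.0, 9.0, 16.0]                   # made-up fitted means
sigma = covariance(mu, p=1, taus=[1.0, 0.5], known_matrices=[Z0, Z1])
```

In this sketch the known matrices carry the dependence structure, while the dispersion parameters (the taus) are what would be estimated from data.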
Hypothesis tests for multiple responses regression models in R: The htmcglm Package
This article describes the R package htmcglm implemented for performing
hypothesis tests on regression and dispersion parameters of multivariate
covariance generalized linear models (McGLMs). McGLMs provide a general
statistical modeling framework for normal and non-normal multivariate data
analysis along with a wide range of correlation structures. The proposed
package uses Wald statistics to perform general hypothesis tests and
build tailored ANOVAs, MANOVAs and multiple comparison tests. The goal of the
package is to provide tools to improve the interpretation of regression and
dispersion parameters. We assess the effects of the covariates on the response
variables by testing the regression coefficients. Similarly, we perform tests
on the dispersion coefficients in order to assess the correlation between study
units. This can be of interest in situations where the observations are
correlated with each other, such as in longitudinal, time series, spatial and
repeated measures studies. The htmcglm package provides a user-friendly
interface to perform MANOVA-like tests as well as multivariate hypothesis tests
for models of the mcglm class. We describe the package implementation and
illustrate it through the analysis of two data sets. The first deals with an
experiment on soybean yield; the problem has three response variables of
different types (continuous, count and binomial) and three explanatory
variables (amount of water, fertilization and block). The second dataset
addresses a problem where responses are longitudinal bivariate counts of
hunted animals; the explanatory variables are the hunting method and sex
of the animal. With these examples we illustrate several tests in
which the proposal proves useful for the evaluation of regression and
dispersion parameters in problems with either dependent or independent
observations.
Comment: arXiv admin note: substantial text overlap with arXiv:2208.0002
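As a rough illustration of the kind of Wald test mentioned in the abstract, here is a minimal Python sketch for a single linear contrast of regression coefficients. It is not the htmcglm interface. The function name, the estimates and the covariance matrix are hypothetical. A general hypothesis with several contrast rows would need a matrix inverse and a chi-square distribution with more degrees of freedom.

```python
import math

def wald_test_single_contrast(beta, vcov, contrast, null_value=0.0):
    """Wald test of H0: contrast' beta = null_value (1 degree of freedom)."""
    estimate = sum(c * b for c, b in zip(contrast, beta))
    # Variance of the contrast: contrast' * vcov * contrast.
    n = len(beta)
    variance = sum(contrast[i] * vcov[i][j] * contrast[j]
                   for i in range(n) for j in range(n))
    statistic = (estimate - null_value) ** 2 / variance
    # Chi-square(1) survival function: P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical estimates: intercept and two treatment effects.
beta = [1.2, 0.8, 0.3]
vcov = [[0.04, 0.00, 0.00],
        [0.00, 0.05, 0.01],
        [0.00, 0.01, 0.05]]
# H0: the two treatment effects are equal (beta[1] - beta[2] = 0).
statistic, p_value = wald_test_single_contrast(beta, vcov, [0, 1, -1])
```

The same construction applies to dispersion parameters if their estimates and covariance matrix are supplied in place of the regression ones.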
Cross-cultural adaptation and validation of the instrument Conditions of Work Effectiveness - Questionnaire-II
OBJECTIVE: This study aimed to translate the Conditions of Work Effectiveness - Questionnaire-II (CWEQ-II), developed by Laschinger, Finegan, Shamian and Wilk as a modification of the original CWEQ, and to validate its content for the Brazilian culture. METHOD: The methodological procedure consisted of translation of the instrument into Portuguese; back-translation; assessment of semantic, idiomatic and cultural equivalence; and tests of the final version. The Portuguese version of the instrument was applied to a group of 40 nurses in two hospitals. RESULTS: The data yielded a Cronbach's alpha of 0.86 for the first hospital and 0.88 for the second. The results of the factor analysis were considered satisfactory. CONCLUSION: The instrument can be used in Brazil.
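For readers unfamiliar with the reliability measure reported above, the following minimal Python sketch computes Cronbach's alpha from item scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The responses are made up for illustration and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: list of items, each a list of scores (one per respondent)."""
    k = len(item_scores)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Made-up responses: 3 items answered by 5 respondents on a 1-5 scale.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
alpha = cronbach_alpha(items)
```

Values closer to 1 indicate that the items vary together, which is the sense in which the reported 0.86 and 0.88 indicate internal consistency.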
Data Science: a description of the first undergraduate courses in Brazilian universities
Due to the increasing volume of data, the demand for suitably qualified data scientists has grown, and Brazilian Higher Education Institutions (HEIs) have sought to meet it. In this context, the objective of this paper is to characterize undergraduate courses in Data Science. We aim to answer questions such as: are the courses offered mostly by public or private universities? When did they start being offered? Are they usually delivered by online distance learning (ODL) or in person? Are they technological or bachelor's degrees? Which groups of subjects make up most of the curriculum? In which regions of the country are they concentrated? How many places are offered, and what is the profile of students admitted to bachelor's and technological courses? To this end, the e-MEC database and the 2021 Higher Education Census were combined, and the data were explored and visualized using multiple correspondence analysis (MCA). The results show a rough balance between the in-person and online modalities; most of the courses are of the technological type and are usually offered by private HEIs. Regarding regions, in-person courses are heavily concentrated in the Southeast region of Brazil.
JOINT MARGINAL MODELING OF HEIGHT AND VOLUME FOR Araucaria angustifolia
Variables measured in forests usually show some degree of correlation. Fitting models that estimate biometric variables independently is therefore not the most suitable approach, and multivariate models gain relevance because they can quantify associations between response variables. In this context, the objective of this study was to fit univariate and multivariate covariance generalized linear models (McGLMs) to estimate tree height and volume. Height, volume and diameter were measured on Araucaria angustifolia in a native forest in the state of Santa Catarina, Brazil. The McGLMs were fitted to estimate height and volume under both the univariate and the multivariate approach. The linear predictor of the models was fixed in advance as a function of the diameter covariate for both responses. Because both responses showed an apparent pattern of non-constant variance, different structures for the matrix linear predictor were tested, with the effect of the covariate ranging up to a third-degree polynomial. In addition, a power parameter was estimated in both approaches in order to obtain a variance function for each variable. The parameter estimates from the univariate and multivariate approaches were similar. In general, the standard errors of the parameters were smaller for the multivariate models, a consequence of the correlation between the response variables. The results also suggested that a compound Poisson-Gamma variance function is suitable for one of the responses and a constant variance function for the other. The most suitable model was obtained with a matrix linear predictor depending only on a dispersion parameter associated with an identity matrix.
Multivariate Generalized Linear Mixed Models for Count Data
The literature on univariate regression models for count data is rich. However, this is not the case for multivariate count data. We therefore present the Multivariate Generalized Linear Mixed Models framework, which handles a multivariate set of responses and measures the correlation between them through random effects that follow a multivariate normal distribution. The model is based on a GLMM with a random intercept, and the estimation process remains the same as for a standard GLMM, with random effects integrated out via the Laplace approximation. We implemented this model efficiently through the TMB package available in R. We used Poisson, negative binomial (NB), and COM-Poisson distributions. To assess the properties of the estimators, we conducted a simulation study considering four sample sizes and three correlation values for each distribution. We obtained unbiased and consistent estimators for the Poisson and NB distributions; for the COM-Poisson distribution the estimators were consistent but biased, especially those for the dispersion, variance, and correlation parameters. These models were applied to two datasets.
The first concerns 30 sites sampled in Australia, with the number of times each of 41 ant species was recorded at each site; as a result, 820 variance-covariance parameters and 41 dispersion parameters are estimated simultaneously, in addition to the regression parameters. The second is from the Australia Health Survey, with 5 response variables and 5190 respondents. Both datasets can be considered overdispersed according to the generalized dispersion index. The COM-Poisson model outperformed its two competitors on three goodness-of-fit measures: AIC, BIC, and maximized log-likelihood. It also produced estimates with smaller standard errors and a greater number of significant correlation coefficients. The proposed model is therefore capable of dealing with multivariate count data with under-, equi- or overdispersed responses, and of measuring any kind of correlation between them while taking the effects of the covariates into account.
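The model comparison described above relies on AIC and BIC, which can be computed from a maximized log-likelihood as in the minimal Python sketch below. The log-likelihood values, parameter counts and sample size are invented for illustration and are not results from the paper.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Made-up fits, NOT results from the paper:
# model name -> (maximized log-likelihood, number of parameters).
fits = {
    "Poisson": (-1250.0, 10),
    "NB": (-1190.0, 11),
    "COM-Poisson": (-1175.0, 11),
}
n_obs = 500  # hypothetical number of observations

best_by_aic = min(fits, key=lambda name: aic(*fits[name]))
best_by_bic = min(fits, key=lambda name: bic(*fits[name], n_obs))
```

Both criteria penalize the number of parameters, so a model with an extra dispersion parameter is preferred only if it improves the log-likelihood enough to offset the penalty.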